Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection
Abstract
Analyzing high-dimensional data poses many challenges due to the “curse of dimensionality”. Not all high-dimensional data exhibit these characteristics, because many data sets contain correlations, which led to the notion of intrinsic dimensionality. Intrinsic dimensionality describes the local behavior of data on a low-dimensional manifold within the higher-dimensional space. We discuss this effect and describe a surprisingly simple modification that allows us to reduce the local intrinsic dimensionality of individual points. While this is unlikely to “cure” all problems associated with high dimensionality, we show its theoretical impact on idealized distributions and how to incorporate it in practice into new, more robust algorithms. To demonstrate the effect of this adjustment, we introduce the novel Intrinsic Stochastic Outlier Score (ISOS), and we propose a modification of the popular t-Stochastic Neighbor Embedding (t-SNE) visualization technique that accounts for intrinsic dimensionality: intrinsic t-Stochastic Neighbor Embedding (it-SNE).
Similar resources
Visualizing and Exploring Dynamic High-Dimensional Datasets with LION-tSNE
T-distributed stochastic neighbor embedding (tSNE) is a popular and prize-winning approach for dimensionality reduction and for visualizing high-dimensional data. However, tSNE is non-parametric: once a visualization is built, tSNE is not designed to incorporate additional data into the existing representation. This severely limits the applicability of tSNE to scenarios where data are added or updated ove...
Heavy-Tailed Symmetric Stochastic Neighbor Embedding
Stochastic Neighbor Embedding (SNE) has been shown to be quite promising for data visualization. Currently, the most popular implementation, t-SNE, is restricted to a particular Student t-distribution as its embedding distribution. Moreover, it uses a gradient descent algorithm that may require users to tune parameters such as the learning step size, momentum, etc., in finding its optimum. In this p...
Stochastic neighbor embedding (SNE) for dimension reduction and visualization using arbitrary divergences
We present a systematic approach to the mathematical treatment of the t-distributed stochastic neighbor embedding (t-SNE) and the stochastic neighbor embedding (SNE) method. This allows an easy adaptation of the methods or exchange of their respective modules. In particular, the divergence which measures the difference between probability distributions in the original and the embedding space ca...
Intrinsic Geometry Visualization for the Interactive Analysis of Brain Connectivity Patterns
Understanding how brain regions are interconnected is an important topic within the domain of neuroimaging. Advances in non-invasive technologies enable larger and more detailed images to be collected more quickly than ever before. These data contribute to create what is usually referred to as a connectome, that is, a comprehensive map of neural connections. The availability of connectome data ...
Doubly supervised embedding based on class labels and intrinsic clusters for high-dimensional data visualization
Visualization of data can assist decision-making processes by presenting the underlying information in a perceptible manner. Many dimension reduction techniques have been proposed to generate faithful visualization snapshots given high-dimensional data. When class labels associated with the data are already provided, supervised dimension reduction methods, which utilize such pre-given label inf...
Publication date: 2017